Software
This class will involve a number of computational homework assignments. These must be completed in quarto markdown files using the python language and be accompanied by a rendered pdf file. Teaching fluency in both R and python is an important goal of the program, and this course is devoted partially to building your python competency, since python is a standard language for machine learning applications.
How to get set up with python
The official course prerequisites include DATA 602, 606, 607, and 608. DATA 602 is our course on python, and I assume that you are familiar with the material in that course. If you are not, it will still be possible to do well in this course, but you will need to be proactive about building your python knowledge. Here are some resources that can help:
- Basic Software Installation and Computing Environment
- Python 3 Installation There are many places to install python from; I recommend using anaconda. I also suggest that you follow the directions in the ISLP book at the end of Chapter 2 to complete your setup. There are several package managers for python, including pip and conda. I use a combination of both, with the caveat that I use the fast version of conda called mamba.
- quarto If you have not already, you should install quarto, which is what you will use to complete your assignments (and potentially your project).
- python with quarto This provides a good basic guide to using python together with quarto in the positron IDE.
- positron is a full-featured IDE for both python and R that is tuned for data science. It is based on VS Code and is the environment that I recommend you use for this course.
- posit.cloud If you have insufficient computing resources, contact me about setting up an account with my course posit.cloud instance.
- google colab (colab.google.com) This is another option if you need more computing resources.
- learn python tutorials This is a series of very basic interactive python tutorials
- python.org beginners guide
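Once your environment is set up, a quick way to confirm that the key packages are installed is a short check script. The sketch below is one possible approach, not part of the official setup directions: it uses only the standard library's importlib.metadata (Python 3.8+), and the PACKAGES list is a hypothetical selection you should adjust. Note that installed distribution names can differ from import names (for example, scikit-learn is imported as sklearn).

```python
from importlib import metadata

# Hypothetical selection of distributions to check; adjust to your needs.
# These are install (distribution) names, which can differ from import names:
# "scikit-learn" is imported as "sklearn".
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn",
            "statsmodels", "ISLP"]


def check_packages(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print one line per package so missing ones are easy to spot.
    for name, version in check_packages(PACKAGES).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:15s} {status}")
```

Run it from the same environment (conda or otherwise) that quarto will use to render your documents; any package reported as NOT INSTALLED can then be added with conda, mamba, or pip.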
Once you complete those, you will want to have a basic understanding of the fundamental python libraries for working with data, graphics, and scientific computing.
- Hands on Machine Learning Tutorials Aurelien Geron, the author of the excellent Hands on Machine Learning books, has basic tutorials on pandas, numpy, and matplotlib on his website. They are implemented as google colab notebooks.
Guide to Key Packages:
- sklearn is the most commonly used machine learning library in python
- statsmodels is a useful package for statistics in python
- quarto Document and website authoring app. Used for your homework and maybe project
- pytorch is a package for neural networks in python
- numpy Basic numerical python package
- pymc is a basic package for Bayesian stats in python
- marginaleffects is a very useful package for interpreting and communicating machine learning models in python
- matplotlib is a powerful plotting library in python
- seaborn is another powerful plotting library, though technically powered by matplotlib.
- islp Python package with the datasets of our textbook
- conda A good python package and environment manager
- mamba A fast drop-in replacement for conda’s package installer
- pandas Standard python package for wrangling data
- polars Blazingly fast data wrangling library intended for larger datasets
- shiny Made initially for R, but it now works with python too. Useful for deploying ML apps
- streamlit Package for making lightweight apps
- FastAPI A lightweight framework for building web APIs, for example to serve a model in production
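To see how a few of these packages fit together, here is a minimal sketch that builds a small pandas DataFrame and fits a linear regression with sklearn. It assumes numpy, pandas, and scikit-learn are installed, and the toy data (y = 2x + 1) are invented purely for illustration; they are not from the textbook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data invented for illustration: y = 2x + 1 exactly.
df = pd.DataFrame({"x": np.arange(10, dtype=float)})
df["y"] = 2 * df["x"] + 1

# sklearn expects a 2-D feature matrix, hence df[["x"]] rather than df["x"].
model = LinearRegression()
model.fit(df[["x"]], df["y"])

print("slope:", model.coef_[0], "intercept:", model.intercept_)

# Predict on new data with the same column name used during fitting.
preds = model.predict(pd.DataFrame({"x": [20.0]}))
print("prediction at x = 20:", preds[0])
```

Because the toy data lie exactly on a line, the fitted slope and intercept should be 2 and 1 up to floating-point error, and the prediction at x = 20 should be about 41. The same fit/predict pattern applies to most sklearn estimators.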